Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
translated by 谷歌翻译
我们研究语言模型是否可以评估自己主张的有效性,并预测他们能够正确回答的问题。我们首先表明,当以正确的格式提供时,较大的模型在多样化的多项选择和True/False问题上进行了很好的校准。因此,我们可以通过要求模型首先提出答案,然后评估其答案正确的概率“ p(true)”来对开放式采样任务进行自我评估。我们发现在各种任务中,P(true)的表现,校准和缩放令人鼓舞。当我们允许模型考虑自己的许多样本之前,在预测一种特定可能性的有效性之前,自我评估的性能进一步改善。接下来,我们研究是否可以培训模型来预测“ P(ik)”,即“我知道”问题的概率,而无需参考任何特定提出的答案。模型在预测P(IK)方面表现良好,并且在跨任务中部分概括,尽管它们在新任务上的P(IK)校准方面遇到了困难。预测的p(IK)概率在存在相关的原始材料的情况下以及对数学单词问题解决方案的提示也适当增加。我们希望这些观察结果为培训更诚实的模型提供了基础,并研究了诚实对模型模仿人类写作以外的其他目标培训的案例的普遍性。
translated by 谷歌翻译
由于生产和依赖数据集以产生自动化决策系统(广告)增加,因此需要评估和询问底层数据的过程。在2018年启动数据集营养标签后,数据营养项目已对标签的设计和目的进行了重大更新,并在2020年代后期推出更新的标签,该标签在本文中预览。新标签包括通过针对数据科学家配置文件的更新的设计和用户界面提供的上下文专用用例和警报。本文讨论了标签旨在减轻的潜在培训数据的危害和偏见,包括标记,新的和现有挑战以及工作的进一步方向,以及预览新的新数据集标签。
translated by 谷歌翻译
The recent increase in public and academic interest in preserving biodiversity has led to the growth of the field of conservation technology. This field involves designing and constructing tools that utilize technology to aid in the conservation of wildlife. In this article, we will use case studies to demonstrate the importance of designing conservation tools with human-wildlife interaction in mind and provide a framework for creating successful tools. These case studies include a range of complexities, from simple cat collars to machine learning and game theory methodologies. Our goal is to introduce and inform current and future researchers in the field of conservation technology and provide references for educating the next generation of conservation technologists. Conservation technology not only has the potential to benefit biodiversity but also has broader impacts on fields such as sustainability and environmental protection. By using innovative technologies to address conservation challenges, we can find more effective and efficient solutions to protect and preserve our planet's resources.
translated by 谷歌翻译
As various city agencies and mobility operators navigate toward innovative mobility solutions, there is a need for strategic flexibility in well-timed investment decisions in the design and timing of mobility service regions, i.e. cast as "real options" (RO). This problem becomes increasingly challenging with multiple interacting RO in such investments. We propose a scalable machine learning based RO framework for multi-period sequential service region design & timing problem for mobility-on-demand services, framed as a Markov decision process with non-stationary stochastic variables. A value function approximation policy from literature uses multi-option least squares Monte Carlo simulation to get a policy value for a set of interdependent investment decisions as deferral options (CR policy). The goal is to determine the optimal selection and timing of a set of zones to include in a service region. However, prior work required explicit enumeration of all possible sequences of investments. To address the combinatorial complexity of such enumeration, we propose a new variant "deep" RO policy using an efficient recurrent neural network (RNN) based ML method (CR-RNN policy) to sample sequences to forego the need for enumeration, making network design & timing policy tractable for large scale implementation. Experiments on multiple service region scenarios in New York City (NYC) shows the proposed policy substantially reduces the overall computational cost (time reduction for RO evaluation of > 90% of total investment sequences is achieved), with zero to near-zero gap compared to the benchmark. A case study of sequential service region design for expansion of MoD services in Brooklyn, NYC show that using the CR-RNN policy to determine optimal RO investment strategy yields a similar performance (0.5% within CR policy value) with significantly reduced computation time (about 5.4 times faster).
translated by 谷歌翻译
The combination of conduct, emotion, motivation, and thinking is referred to as personality. To shortlist candidates more effectively, many organizations rely on personality predictions. The firm can hire or pick the best candidate for the desired job description by grouping applicants based on the necessary personality preferences. A model is created to identify applicants' personality types so that employers may find qualified candidates by examining a person's facial expression, speech intonation, and resume. Additionally, the paper emphasises detecting the changes in employee behaviour. Employee attitudes and behaviour towards each set of questions are being examined and analysed. Here, the K-Modes clustering method is used to predict employee well-being, including job pressure, the working environment, and relationships with peers, utilizing the OCEAN Model and the CNN algorithm in the AVI-AI administrative system. Findings imply that AVIs can be used for efficient candidate screening with an AI decision agent. The study of the specific field is beyond the current explorations and needed to be expanded with deeper models and new configurations that can patch extremely complex operations.
translated by 谷歌翻译
Correct scoring of a driver's risk is of great significance to auto insurance companies. While the current tools used in this field have been proven in practice to be quite efficient and beneficial, we argue that there is still a lot of room for development and improvement in the auto insurance risk estimation process. To this end, we develop a framework based on a combination of a neural network together with a dimensionality reduction technique t-SNE (t-distributed stochastic neighbour embedding). This enables us to visually represent the complex structure of the risk as a two-dimensional surface, while still preserving the properties of the local region in the features space. The obtained results, which are based on real insurance data, reveal a clear contrast between the high and low risk policy holders, and indeed improve upon the actual risk estimation performed by the insurer. Due to the visual accessibility of the portfolio in this approach, we argue that this framework could be advantageous to the auto insurer, both as a main risk prediction tool and as an additional validation stage in other approaches.
translated by 谷歌翻译
As language models (LMs) scale, they develop many novel behaviors, good and bad, exacerbating the need to evaluate how they behave. Prior work creates evaluations with crowdwork (which is time-consuming and expensive) or existing data sources (which are not always available). Here, we automatically generate evaluations with LMs. We explore approaches with varying amounts of human effort, from instructing LMs to write yes/no questions to making complex Winogender schemas with multiple stages of LM-based generation and filtering. Crowdworkers rate the examples as highly relevant and agree with 90-100% of labels, sometimes more so than corresponding human-written datasets. We generate 154 datasets and discover new cases of inverse scaling where LMs get worse with size. Larger LMs repeat back a dialog user's preferred answer ("sycophancy") and express greater desire to pursue concerning goals like resource acquisition and goal preservation. We also find some of the first examples of inverse scaling in RL from Human Feedback (RLHF), where more RLHF makes LMs worse. For example, RLHF makes LMs express stronger political views (on gun rights and immigration) and a greater desire to avoid shut down. Overall, LM-written evaluations are high-quality and let us quickly discover many novel LM behaviors.
translated by 谷歌翻译
We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.
translated by 谷歌翻译
Machine learning (ML) has found broad applicability in quantum information science in topics as diverse as experimental design, state classification, and even studies on quantum foundations. Here, we experimentally realize an approach for defining custom prior distributions that are automatically tuned using ML for use with Bayesian quantum state estimation methods. Previously, researchers have looked to Bayesian quantum state tomography due to its unique advantages like natural uncertainty quantification, the return of reliable estimates under any measurement condition, and minimal mean-squared error. However, practical challenges related to long computation times and conceptual issues concerning how to incorporate prior knowledge most suitably can overshadow these benefits. Using both simulated and experimental measurement results, we demonstrate that ML-defined prior distributions reduce net convergence times and provide a natural way to incorporate both implicit and explicit information directly into the prior distribution. These results constitute a promising path toward practical implementations of Bayesian quantum state tomography.
translated by 谷歌翻译